Short-time phase spectrum in speech processing: A review and some experimental results
Abstract
Incorporating information from the short-time phase spectrum into a feature set for automatic speech recognition (ASR) may improve recognition accuracy. Currently, however, it is common practice to discard this information in favour of features that are derived purely from the short-time magnitude spectrum. There are two reasons for this: (1) the results of some well-known human listening experiments have indicated that the short-time phase spectrum conveys a negligible amount of intelligibility at the small window durations of 20–40 ms used for ASR spectral analysis, and (2) using the short-time phase spectrum directly for ASR has proven difficult from a signal processing viewpoint, due to phase-wrapping and other problems. In this article, we explore the possibility of using short-time phase spectrum information for ASR by considering the two points mentioned above. To address the first point, we review the results of our own set of human listening experiments. Contrary to previous studies, our results indicate that the short-time phase spectrum can indeed contribute significantly to speech intelligibility over small window durations of 20–40 ms. Also, the results of these listening experiments, in addition to some ASR experiments, indicate that at least part of this intelligibility may be supplementary to that provided by the short-time magnitude spectrum. To address the second point (i.e., the signal processing difficulties), we suggest that it may be necessary to transform the short-time phase spectrum into a more physically meaningful representation from which useful features could be extracted. Specifically, we investigate the frequency-derivative (or group delay function, GDF) and the time-derivative (or instantaneous frequency distribution, IFD) as potential candidates for this intermediate representation. We review our recent work, where we have performed various experiments which show that the GDF and IFD may be useful for ASR.
In our recent work, we have also conducted several ASR experiments to test a feature set derived from the GDF. We found that, in most cases, these features perform worse than the standard MFCC features. Therefore, we suggest that a short-time phase spectrum feature set may ultimately be derived from a concatenation of information from both the GDF and IFD representations. For best performance, the feature set may also need to be concatenated with short-time magnitude spectrum information. In addition to addressing the two aforementioned points, we also discuss a number of other speech applications in which the short-time phase spectrum has proven to be very useful. We believe that an appreciation for how the short-time phase spectrum has been used for other tasks, in addition to the results of our own experiments, will provoke fellow researchers to also investigate its potential for use in ASR. © 2006 Elsevier Inc. All rights reserved.
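To make the two derivative representations named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' feature extraction. It assumes the conventional definitions: the group delay is the negative frequency-derivative of the phase, computed here through the standard identity that uses the DFT of n·x[n] and so avoids phase unwrapping, and the instantaneous frequency is estimated from the phase advance between two frames one sample apart. The function names `group_delay` and `instantaneous_frequency`, the Hann window, and the `eps` floor are illustrative assumptions.

```python
import numpy as np


def group_delay(frame, n_fft=None, eps=1e-12):
    """Group delay (in samples) of one frame, without phase unwrapping.

    Uses tau(w) = Re(Y(w) / X(w)), where X is the DFT of x[n] and
    Y is the DFT of n * x[n]. The eps floor (an assumption of this
    sketch) guards against division by zero at spectral nulls.
    """
    x = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(x)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    power = np.maximum(np.abs(X) ** 2, eps)
    return (X.real * Y.real + X.imag * Y.imag) / power


def instantaneous_frequency(signal, start, frame_len, n_fft=None):
    """Per-bin instantaneous frequency estimate (radians per sample).

    Takes two Hann-windowed frames one sample apart and returns the
    wrapped phase advance between them in each frequency bin.
    """
    s = np.asarray(signal, dtype=float)
    n_fft = n_fft or frame_len
    window = np.hanning(frame_len)
    X0 = np.fft.rfft(window * s[start:start + frame_len], n_fft)
    X1 = np.fft.rfft(window * s[start + 1:start + 1 + frame_len], n_fft)
    return np.angle(X1 * np.conj(X0))
```

As a sanity check, a unit impulse delayed by five samples should give a group delay of five samples in every bin, and a steady sinusoid should give an instantaneous frequency close to its true angular frequency in the bins near its spectral peak. Values in bins with little energy are unreliable for both quantities, which is one of the practical difficulties in using the phase spectrum directly.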
Related resources
Auditory processing skills in brainstem level of autistic children: A Review Study
Aims: Autism is a pervasive developmental disorder. Deficit in sensory functions is one of the characteristics of people with autism, and usually these people show abnormality in processing and correct interpretation of auditory information. Also people with Autism show problems in communicating with others. This review article deals with the accurate understanding of Auditory processing skills...
Iterative reconstruction of speech from short-time Fourier transform phase and magnitude spectra
In this paper, we consider the topic of iterative, one dimensional, signal reconstruction (specifically speech signals) from the magnitude spectrum and the phase spectrum. While this topic has been extensively researched and documented, we wish to recast some well-established results for the benefit of new researchers and those who desire a short, yet comprehensive, review of the subject. The t...
Time-frequency distributions for automatic speech recognition
The use of general time-frequency distributions as features for automatic speech recognition (ASR) is discussed in the context of hidden Markov classifiers. Short-time averages of quadratic operators, e.g., energy spectrum, generalized first spectral moments, and short-time averages of the instantaneous frequency, are compared to the standard front end features, and applied to ASR. Theoretical ...
Usefulness of Phase in Speech Processing
It is a common belief in the speech community that the short-time phase spectrum plays very little (or, no) role in human perception tasks as well as in automatic speech recognition systems. In this paper, the usefulness of phase information is explored in human speech perception as well as in automatic speech recognition. Through human perception experiments, it is shown that the short-time ph...
Improving the performance of MFCC for Persian robust speech recognition
The Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Journal title:
- Digital Signal Processing
Volume 17
Publication date: 2007